
feat: add retry layer for push metrics exporters#9036

Merged
rohan-b99 merged 9 commits intodevfrom
rohan-b99/otlp-push-exporter-retries
Mar 25, 2026


@rohan-b99
Contributor

@rohan-b99 rohan-b99 commented Mar 19, 2026

Add RetryMetricExporter to the apollo metrics and otlp named exporters. It retries failed exports up to 3 times with exponential backoff.
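For readers unfamiliar with the pattern, here is a minimal sketch of a retry wrapper with exponential backoff. It is not the PR's RetryMetricExporter. The Exporter trait, RetryExporter and its fields are hypothetical stand-ins (the real wrapper wraps a PushMetricExporter, per its doc comment). The sketch blocks with std::thread::sleep, adds no jitter, and reads "up to 3 times" as three retries after the initial attempt.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for an exporter; NOT the real
/// opentelemetry_sdk `PushMetricExporter` trait.
trait Exporter {
    type Error;
    fn export(&mut self, batch: &[u64]) -> Result<(), Self::Error>;
}

/// Hypothetical wrapper: retries failed exports with exponential backoff.
struct RetryExporter<E> {
    inner: E,
    max_retries: u32,     // retries after the first attempt (assumed 3 here)
    base_delay: Duration, // wait before the first retry
}

impl<E: Exporter> RetryExporter<E> {
    fn export(&mut self, batch: &[u64]) -> Result<(), E::Error> {
        let mut attempt: u32 = 0;
        loop {
            match self.inner.export(batch) {
                Ok(()) => return Ok(()),
                // Retry budget exhausted: surface the last error to the caller.
                Err(e) if attempt >= self.max_retries => return Err(e),
                Err(_) => {
                    // Delay doubles on each retry: base, 2*base, 4*base, ...
                    let delay = self.base_delay * 2u32.pow(attempt);
                    thread::sleep(delay);
                    attempt += 1;
                }
            }
        }
    }
}
```

With max_retries = 3 and base_delay = 100 ms, an exporter that keeps failing is called four times, with waits of 100, 200 and 400 ms between calls, before the last error is returned.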


Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • PR description explains the motivation for the change and relevant context for reviewing
  • PR description links appropriate GitHub/Jira tickets (creating when necessary)
  • Changeset is included for user-facing changes
  • Changes are compatible[1]
  • Documentation[2] completed
  • Performance impact assessed and acceptable
  • Metrics and logs are added[3] and documented
  • Tests added and passing[4]
    • Unit tests
    • Integration tests
    • Manual tests, as necessary

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. Many (if not most) features benefit from built-in observability and debug-level logs. Please read this guidance on metrics best practices.

  4. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

@rohan-b99 rohan-b99 requested a review from a team as a code owner March 19, 2026 17:46
@apollo-librarian
Contributor

apollo-librarian bot commented Mar 19, 2026

✅ Docs preview has no changes

The preview was not built because there were no changes.

Build ID: 911ae5a7a132b5fa6fefbf00


✅ AI Style Review — No Changes Detected

No MDX files were changed in this pull request.


This review is AI-generated. Please use common sense when accepting these suggestions, as they may not always be accurate or appropriate for your specific context.


@rohan-b99 rohan-b99 changed the title Add retry layer for push metrics exporters feat: add retry layer for push metrics exporters Mar 24, 2026
Contributor

@conwuegb conwuegb left a comment


It mostly looks good; I just have a few questions.

//! Retry wrapper for push metric exporters.
//!
//! Wraps a `PushMetricExporter` and retries failed exports a configurable number
//! of times with exponential backoff. Only surfaces the error after all attempts
Contributor


Out of curiosity, what does "exponential backoff" mean here? Is it referring to exponentially longer wait times in between attempts? And if so, why?

Contributor Author


That's right - the wait between attempts grows exponentially. It's a backoff strategy designed to avoid overloading the service with excessive requests if it's already at capacity. I checked the OTEL docs to see if they had any recommendations here (originally I didn't think to check), and interestingly they do recommend exponential backoff with jitter: https://opentelemetry.io/docs/specs/otel/protocol/exporter/#retry. I'll go ahead and add the jitter aspect for consistency.
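The OpenTelemetry exporter spec recommends exponential backoff with jitter. One common variant, "full jitter", picks a uniformly random delay between zero and the capped exponential delay, so that many clients that fail at the same moment don't retry in lockstep. The sketch below is an assumption, not necessarily the jitter scheme this PR adopted: full_jitter_delay and its parameters are hypothetical, and the random sample is passed in by the caller so the function needs no RNG crate and stays deterministic.

```rust
use std::time::Duration;

/// Hypothetical helper: delay before retry number `attempt` (0-based) using
/// "full jitter", i.e. rand_unit * min(cap, base * 2^attempt).
/// `rand_unit` should be a random sample in [0.0, 1.0); it is clamped to [0.0, 1.0].
fn full_jitter_delay(attempt: u32, base: Duration, cap: Duration, rand_unit: f64) -> Duration {
    // saturating_pow avoids u32 overflow; checked_mul falls back to the cap
    // if the Duration multiplication itself would overflow.
    let exponential = base
        .checked_mul(2u32.saturating_pow(attempt))
        .unwrap_or(cap);
    let ceiling = exponential.min(cap);
    ceiling.mul_f64(rand_unit.clamp(0.0, 1.0))
}
```

Keeping the random source outside the function makes the backoff schedule easy to unit-test; a real exporter would feed it a fresh sample for every retry.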

Comment thread apollo-router/src/plugins/telemetry/metrics/retry.rs
Contributor

@carodewig carodewig left a comment


I'm curious how this dovetails with (or differs from) the retry mechanism already present in ApolloExporter::submit_report. Does that already retry the export, so that there are now double retries involved? Or are there now two main retry mechanisms, one for ApolloOtlpExporter and one for ApolloExporter?

This area of the codebase generally confuses me (i.e., the difference between ApolloExporter and ApolloOtlpExporter), so this may not be relevant, but I wanted to ask to help my own understanding!

@rohan-b99
Contributor Author


@carodewig I just double-checked this, as most of it is new to me as well. They do send different data: ApolloExporter sends our proprietary protobuf-encoded Report struct, whereas ApolloOtlpExporter sends actual OTLP metrics. The backoff logic in ApolloExporter seems to handle the case where Studio responds to the router telling it not to send reports for a while, whereas the retry logic in this PR is meant to handle any case where the request to an OTEL collector fails.
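To make that distinction concrete, here is a hypothetical sketch of how a client could treat the two situations differently: a server-requested pause is honoured as-is and not counted as a retry, while a failed request is retried with exponential backoff until the retry budget runs out. Outcome, Action and next_action are illustrative names, not types from the router codebase.

```rust
use std::time::Duration;

/// Hypothetical classification of a single export attempt.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    Success,
    /// Server told the client to stop sending for a while (the Studio/ApolloExporter case).
    BackoffRequested(Duration),
    /// The request itself failed, e.g. the collector was unreachable (this PR's case).
    TransientFailure,
}

/// Hypothetical next step for the caller.
#[derive(Debug, PartialEq)]
enum Action {
    Done,
    PauseReporting(Duration),
    RetryAfter(Duration),
    GiveUp,
}

fn next_action(outcome: Outcome, attempt: u32, max_retries: u32, base: Duration) -> Action {
    match outcome {
        Outcome::Success => Action::Done,
        // Server-directed: honour the requested interval instead of retrying.
        Outcome::BackoffRequested(d) => Action::PauseReporting(d),
        // Client-side retry with exponential backoff while budget remains.
        Outcome::TransientFailure if attempt < max_retries => {
            Action::RetryAfter(base * 2u32.pow(attempt))
        }
        Outcome::TransientFailure => Action::GiveUp,
    }
}
```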

Contributor

@conwuegb conwuegb left a comment


LGTM

@rohan-b99 rohan-b99 merged commit 9c14209 into dev Mar 25, 2026
15 checks passed
@rohan-b99 rohan-b99 deleted the rohan-b99/otlp-push-exporter-retries branch March 25, 2026 17:43
@abernix abernix mentioned this pull request Mar 31, 2026

Labels

None yet

Projects

None yet


3 participants